Building a custom error page with PHP and Apache requires two steps. You need to tell Apache to run a PHP program when it encounters a 404 ("Page Not Found") error. And you need to write the corresponding program that takes the appropriate action.
Configuring Apache
To make Apache run a PHP program when a page is not found, use the ErrorDocument directive:
ErrorDocument 404 /error-404.php
This tells Apache to serve up error-404.php in the document root directory when it encounters a 404 error. The ErrorDocument directive can go in Apache's httpd.conf file, but it also works in .htaccess files in individual directories. You can have a site-wide error-handling page or different error-handling pages for different parts of your site. Apache also sets some server variables that the error-handling page can access:
REDIRECT_URL: the URL-path that was not found. If a user asks for the nonexistent page http://www.example.com/lunch/pastrami.html, for example, this variable is set to /lunch/pastrami.html.
REDIRECT_STATUS: the HTTP response status resulting from the request for the original page. In our case, this is always "404". You can use ErrorDocument with other status codes, though, so if you have one error-handling page for multiple statuses, you can use this variable to determine which error status caused the error-handling page to be loaded.
REDIRECT_ERROR_NOTES: a brief description of what went wrong, for example, "File does not exist: /usr/local/apache/docroot/lunch/pastrami.html".
REDIRECT_REQUEST_METHOD: the method of the request for the original page, such as GET or POST.
If there is a query string in the original request, it is stored in REDIRECT_QUERY_STRING.
The error page does not have access to the GET or POST variables via $_GET, $_POST, or $_REQUEST, but cookie variables are still available in $_COOKIE.
These REDIRECT variables are available in the PHP superglobal array $_SERVER: $_SERVER['REDIRECT_URL'], $_SERVER['REDIRECT_STATUS'], and so forth.
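As a minimal sketch of reading these values, the following helpers take the $_SERVER array as an argument and fall back to a default when a variable is missing. The function names redirect_info() and error_message(), and the message texts, are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: collect the REDIRECT_* values described above.
function redirect_info(array $server)
{
    $names = array('REDIRECT_URL', 'REDIRECT_STATUS', 'REDIRECT_ERROR_NOTES',
                   'REDIRECT_REQUEST_METHOD', 'REDIRECT_QUERY_STRING');
    $info = array();
    foreach ($names as $name) {
        // null marks a variable that is not set for this request
        $info[$name] = isset($server[$name]) ? $server[$name] : null;
    }
    return $info;
}

// Hypothetical helper: choose a message from the status that caused the
// error page to load, for one page shared by several status codes.
function error_message(array $server)
{
    $status = isset($server['REDIRECT_STATUS'])
            ? (int) $server['REDIRECT_STATUS'] : 0;
    switch ($status) {
        case 404:
            return 'Page not found.';
        case 403:
            return 'Access denied.';
        default:
            return 'An error occurred.';
    }
}
```

In an error-handling page you might call print error_message($_SERVER); to show the message for whichever status triggered the page.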
The REDIRECT variables can be used to do many different things in response to a request for a nonexistent page. If your site has been recently reorganized, you can transparently redirect users to the new URL that corresponds to a particular old URL:
<?php
// map old URL-paths to their new locations
$map = array('/old/1' => '/new/2.html',
             '/old/2' => '/new/3.html');
if (isset($map[$_SERVER['REDIRECT_URL']])) {
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] .
               $map[$_SERVER['REDIRECT_URL']];
    // keep the original query string, if there was one
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
    exit;
} else {
    print "This page is really not found.";
}
?>
A redirect response needs to include the query string in the redirect URL if the query string was present in the original request. Redirects always use the GET method. Including the query string preserves any GET variables from the original request, but POST data is lost.
Additionally, the protocol and host name need to be at the beginning of the redirect URL sent with the Location header. This example hardcodes "http" as the protocol and gets the host name from the HTTP_HOST server variable. To work transparently under https as well as http, your code should test for the presence of $_SERVER['HTTPS']. If this variable is set to "on", then the protocol should be "https" instead of "http".
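One way to sketch that test is a small helper that builds the protocol and host name part of the redirect URL. The function name base_url() is hypothetical, and the sketch assumes HTTP_HOST is always present in the array it receives.

```php
<?php
// Hypothetical helper: build "protocol://host" from server variables.
function base_url(array $server)
{
    // Use https only when HTTPS is set to "on" (compared case-insensitively)
    $https = isset($server['HTTPS'])
          && strtolower($server['HTTPS']) === 'on';
    return ($https ? 'https' : 'http') . '://' . $server['HTTP_HOST'];
}
```

With this in place, a redirect could start its URL with base_url($_SERVER) instead of the hardcoded 'http://' . $_SERVER['HTTP_HOST'].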
Basic redirection could also be accomplished with a list of Apache Redirect or RedirectMatch directives, but you can construct more complicated expressions in PHP. You can easily redirect multiple old URLs to the same new URL:
<?php
// each new URL-path lists the old URL-paths that should redirect to it
$rev_map = array('/new.html' => array('/old-1.html',
                                      '/old-2.html',
                                      '/old-3.html'));
// build the old => new lookup table
$map = array();
foreach ($rev_map as $new => $ar) {
    foreach ($ar as $old) {
        $map[$old] = $new;
    }
}
if (isset($map[$_SERVER['REDIRECT_URL']])) {
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] .
               $map[$_SERVER['REDIRECT_URL']];
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
    exit;
} else {
    print "This page is really not found.";
}
?>
You can look up the new URLs to which the old ones map in a database:
<?php
$db = mysqli_connect('localhost', 'user', 'password', 'pages');
// escape quotes and SQL wildcards from the old URL
$old_page = mysqli_real_escape_string($db, $_SERVER['REDIRECT_URL']);
$old_page = strtr($old_page, array('_' => '\_',
                                   '%' => '\%'));
$r = mysqli_query($db, "SELECT new FROM pages
                        WHERE old LIKE '$old_page'");
if ($r && mysqli_num_rows($r) == 1) {
    $ob = mysqli_fetch_object($r);
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] . $ob->new;
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
    exit;
} else {
    print "This page is really not found.";
}
?>
If you need to parse values from $_SERVER['REDIRECT_QUERY_STRING'] into variables to determine the new URL, use parse_str(). If $_SERVER['REDIRECT_QUERY_STRING'] is artist=weird+al&album=dare+to+be+stupid, then parse_str($_SERVER['REDIRECT_QUERY_STRING'],$vars) sets $vars['artist'] to "weird al" and $vars['album'] to "dare to be stupid".
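As a short sketch, you can wrap the call in a helper that returns the parsed values as an array. The function name query_vars() and the /albums/ path used to build a new URL-path are hypothetical; only parse_str() and rawurlencode() are real PHP functions here.

```php
<?php
// Hypothetical helper: turn a raw query string into an array of values.
function query_vars($query_string)
{
    $vars = array();
    // parse_str() decodes "+" and %XX sequences while filling $vars
    parse_str($query_string, $vars);
    return $vars;
}

$vars = query_vars('artist=weird+al&album=dare+to+be+stupid');
// $vars['artist'] is "weird al" and $vars['album'] is "dare to be stupid"

// Hypothetical use: build a new URL-path from one of the parsed values
$new_path = '/albums/' . rawurlencode($vars['album']) . '.html';
```

Passing the second argument to parse_str() keeps the parsed values in an array of your choosing instead of creating variables in the current scope.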
You can even use the error document to make a simple caching system. If a page isn't found, get its contents from your database and write them to disk. Then, redirect the user to the same URL they just asked for. Since the page now exists, they'll get it, and not the error page:
<?php
$db = mysqli_connect('localhost', 'user', 'password', 'pages');
// escape quotes and SQL wildcards from the old URL
$url = mysqli_real_escape_string($db, $_SERVER['REDIRECT_URL']);
$url = strtr($url, array('_' => '\_',
                         '%' => '\%'));
// look for the page in the database
$r = mysqli_query($db, "SELECT page FROM pages
                        WHERE url LIKE '$url'");
if ($r && mysqli_num_rows($r) == 1) {
    $ob = mysqli_fetch_object($r);
    if ($fp = fopen($_SERVER['DOCUMENT_ROOT'] .
                    $_SERVER['REDIRECT_URL'], 'w')) {
        // write the page to disk
        fwrite($fp, $ob->page);
        fclose($fp);
        // send the user back to the same URL
        $new_loc = 'http://' .
                   $_SERVER['HTTP_HOST'] .
                   $_SERVER['REDIRECT_URL'];
        if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
            $new_loc .= '?' .
                        $_SERVER['REDIRECT_QUERY_STRING'];
        }
        header("Location: $new_loc");
        exit;
    } else {
        // couldn't generate the page
        print "This page is really not found.";
    }
} else {
    // couldn't find the page in the database
    print "This page is really not found.";
}
?>
In this example, the entire contents of a page are stored in the page column of the pages table and are written to a file with fwrite(). You could do more interesting or complicated things when generating a page, like pull multiple pieces of the page from different places or populate a template with dynamic data. However you generate the page, publishing a new version of it is easy. Just update the database and delete the file from disk. The next time a user asks for that page, it won't be found. The error-handling page will load the updated page (or its components) from the database and write the new version to a file.
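The "delete the file from disk" step might be sketched as follows. The function name invalidate_page() is hypothetical, and the sketch assumes both arguments are trusted values from your own publishing code, not from user input.

```php
<?php
// Hypothetical helper: remove the cached copy of a page so that the next
// request for it reaches the error-handling page and regenerates it.
function invalidate_page($doc_root, $url_path)
{
    $file = rtrim($doc_root, '/') . $url_path;
    if (is_file($file)) {
        // true on success, false if the file could not be removed
        return unlink($file);
    }
    // nothing cached on disk, so there is nothing to delete
    return true;
}
```

After updating a row in the database, you could call something like invalidate_page($_SERVER['DOCUMENT_ROOT'], '/lunch/pastrami.html') so the stale file no longer shadows the new content.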
If you're sending a user to a new PHP page, it's important to use a redirect instead of just loading the page with include(). The error page doesn't have GET or POST variables set, and some server variables are different (for example, $_SERVER['PHP_SELF'] points to the error page, not the original URL). If you're sending the user to a static page, however, including content without a redirect can be useful. You can use an error-handling page to provide access to a library of files without keeping the files under the web server document root, for example:
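<?php
$file_root = '/usr/local/songs/';
// strip the leading slash and lowercase the requested name
$song = strtolower(substr($_SERVER['REDIRECT_URL'], 1));
$song_file = realpath($file_root . substr($song, 0, 1) . "/$song.mp3");
// realpath() returns false when the file does not exist
if ($song_file !== false &&
    strncmp($song_file, $file_root, strlen($file_root)) == 0 &&
    is_readable($song_file)) {
    // replace the 404 status with a success status
    http_response_code(200);
    header('Content-Type: audio/mpeg');
    header('Content-Disposition: attachment; filename="' .
           basename($song_file) . '"');
    readfile($song_file);
} else {
    print "Unknown song.";
}
?>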
If this error-handling page is set up for the root directory of http://www.example.com/, asking for http://www.example.com/EatIt sends you the file /usr/local/songs/e/eatit.mp3, if that file exists. Checking to see whether the output of realpath() begins with $file_root prevents a user from passing directory-changing strings like "/../" in the URL. If a file is found, the page sends the right status code and headers to tell the user that they're getting an MP3 file and then sends the contents of the song file.
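The realpath() check can also be pulled into a reusable function. This is a sketch under the assumption that the root directory exists; the name resolve_inside() is hypothetical.

```php
<?php
// Hypothetical helper: resolve $relative under $root and return the real
// path only if it is still inside $root; otherwise return false.
function resolve_inside($root, $relative)
{
    $real_root = realpath($root);
    $real_file = realpath($root . '/' . $relative);
    if ($real_root === false || $real_file === false) {
        return false; // the root or the file does not exist
    }
    // Compare against the root plus a trailing separator, so a sibling
    // such as "/usr/local/songs-private" does not pass as "/usr/local/songs".
    $prefix = rtrim($real_root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($real_file, $prefix, strlen($prefix)) === 0) {
        return $real_file;
    }
    return false;
}
```

Because realpath() resolves ".." segments and symbolic links before the comparison, a request containing "../" resolves to a path outside the root and is rejected.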
The error-handling page doesn't just have to find a new page to send to users. It can notify the webmaster that a page is missing. You can use this to find out if your own site has bad links to itself:
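<?php
// the referrer header is optional, so fall back to an empty string
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (preg_match('{^https?://' . preg_quote($_SERVER['HTTP_HOST']) . '}',
               $referer)) {
    // capture the output of print_r() instead of printing it
    ob_start();
    print_r($_SERVER);
    $data = ob_get_contents();
    ob_end_clean();
    mail($_SERVER['SERVER_ADMIN'],
         'Page Not Found: ' . $_SERVER['REDIRECT_URL'],
         $data);
}
?>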
The preg_match() statement finds referrer URLs that are on the same host as the current request by comparing the beginning of the referring URL to $_SERVER['HTTP_HOST']. If they match, the output of print_r($_SERVER) is stored in $data using output buffering: ob_start() tells PHP to capture output in a buffer instead of printing it. ob_get_contents() returns the contents of that buffer. ob_end_clean() turns off output buffering without printing the buffer. The mail() function sends a message to the server administrator. The body of the message (all the $_SERVER variables in $data) contains the referring URL and other information that you can use to fix the page with the bad link.
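The output-buffering steps can be isolated in a small function, which makes the capture easy to try on its own. The name capture_print_r() is hypothetical.

```php
<?php
// Hypothetical helper: return what print_r() would print for $value.
function capture_print_r($value)
{
    ob_start();                 // start capturing output in a buffer
    print_r($value);            // output goes to the buffer, not the browser
    $data = ob_get_contents();  // copy the buffer's contents
    ob_end_clean();             // discard the buffer and stop capturing
    return $data;
}
```

print_r() also accepts a second argument, so print_r($value, true) returns the same string directly. The buffering approach is still useful for functions that can only print their output.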